Version: 25.09

Custom Content Identifier Rules

The Content matching rules page under Preferences provides a unified interface for you to define content inspection rules.

Creating Custom Rules

To create a custom rule:

  1. On the Rules page, click Custom Rule.
  2. In the Create New Rule pop-up window, enter a name and a description.
  3. In the XML Rule editor, write the rule that defines the matching criteria. See Rule Editor for a description of the rule elements.
  4. Click Save.

Rule Editor

Using the XML-based rule editor, you can create custom rules that define pattern elements and specific criteria used to match content.

The rule consists of the following two sections.

  • Entities
  • Keywords and Regex

Here is a breakdown of the rule.

Entity

The Entity object contains the attributes and pattern definitions that specify the content-matching criteria. An entity must contain at least one pattern definition.

<Rules>
  <Entity patternsProximity="100" recommendedConfidence="85">
    <Pattern confidenceLevel="85">
      <IdMatch idRef="num_string" />
      <Match idRef="words" minOccurs="1" />
    </Pattern>
  </Entity>
  <!-- Entity is required and must contain at least one Pattern child. -->
  <!-- patternsProximity: The number of characters of surrounding content that the Entity's child Pattern elements consider as corroborative evidence. Must be 1 or more. -->
  <!-- recommendedConfidence: The confidence threshold (1 - 100) to consider the Entity as matching. -->
  <!-- confidenceLevel: The confidence (1 - 100) that this Pattern contributes to the parent Entity's overall confidence level if it matches. -->
  <!-- idRef is required and should reference a Keyword id or a Regex id. -->
  <!-- Match: the match is not included in the match results; instead, it serves as supplementary evidence. -->

Entity Elements

| Element | Sub Element | Description |
| :---- | :---- | :---- |
| patternsProximity | | The number of characters around the matched content within which supporting evidence is considered. For example, with a value of 100, a company name counts as support only if it is found within 100 characters of a numerical pattern such as 9999.99. |
| recommendedConfidence | | The minimum threshold percentage that the combined "confidenceLevel" of all the patterns must meet or exceed to be considered a match. For example, if the recommendedConfidence is 85%, then the combined confidenceLevel of all the patterns must be equal to or greater than 85% for the pattern found in the content to be identified as a match. The more critical it is to identify the pattern correctly, the higher the recommendedConfidence value you can assign. |
| Pattern | confidenceLevel | An estimated percentage value assigned based on the reliability of the pattern. For example, the confidence level of identifying a numerical value such as "9999.9999" is higher than that of a word such as "transaction", which is reliable only in the right context. |
| Pattern | IdMatch | References, through its idRef attribute, the primary keyword or regular expression that should be matched. |
| Pattern | Match | References a supplementary keyword or regular expression that provides supporting evidence and increases confidence in pattern identification. |
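
To get a feel for how patternsProximity and a supporting Match work together, the following Python sketch searches for a primary pattern and then checks whether any supporting term appears within a given number of characters of it. It is an illustration only, not the product's matching engine: the function name `has_support_nearby`, the symmetric window on both sides of the primary match, and the case-insensitive whole-word comparison are all assumptions made for this example.

```python
import re


def has_support_nearby(text, primary_pattern, support_terms, proximity):
    """Return True if a support term lies within `proximity` characters of a primary match.

    Assumption: the window extends `proximity` characters before and after the
    primary match. The product may measure the window differently.
    """
    for match in re.finditer(primary_pattern, text):
        start = max(0, match.start() - proximity)
        end = min(len(text), match.end() + proximity)
        window = text[start:end]
        for term in support_terms:
            # Whole-word, case-insensitive comparison (an assumption for this sketch).
            if re.search(r"\b" + re.escape(term) + r"\b", window, re.IGNORECASE):
                return True
    return False


# Made-up sample texts: one with the company name close to the value, one far away.
near = "Google invoiced us for 9999.9999 last week."
far = "google" + " x" * 150 + " 9999.9999"

print(has_support_nearby(near, r"\b9999\.9999\b", ["google", "facebook"], 100))  # True
print(has_support_nearby(far, r"\b9999\.9999\b", ["google", "facebook"], 100))   # False
```

In the second text the company name sits roughly 300 characters before the value, so a window of 100 characters does not reach it.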

Keywords and Regex

In the second section of the rule, you define the primary and supplementary keywords and regular expressions to be matched.

  <Keyword id="words">
    <Group matchStyle="word">
      <Term>google</Term>
      <Term>facebook</Term>
    </Group>
  </Keyword>
  <Regex id="phone_number">\d{10}</Regex>
  <Keyword id="num_string">
    <Group matchStyle="numerical_string">
      <Term>9999.9999</Term>
    </Group>
  </Keyword>
</Rules>

Keyword & Regex Elements

| Element | Description |
| :---- | :---- |
| Keyword | |
| Keyword id | A unique ID for this keyword group. For example, "words", "num_string", etc. IDs must not repeat within a rule. |
| Group | |
| Group matchStyle | Specifies the type of string to be matched. For example, "letter", "word", "phrase", "symbol", "numerical_string", etc. |
| Group Term | One or more individual terms within the keyword group that should be matched. |
| Regex id | A unique ID for this regular expression. |
| Regex | The regular expression that defines a content pattern. For example, "\d{10}" is a regular expression that matches a sequence of 10 digits, such as a phone number. |
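
Because a missing backslash changes what a pattern means, it can help to try a regular expression on sample text before saving the rule. The sketch below uses Python's `re` module as a stand-in. The rule editor's own engine is not Python, so treat this as a rough check that only covers constructs such as `\d`, `{10}`, and `\b`, which behave the same way in Perl-style syntax. The sample strings are made up.

```python
import re

ten_digits = re.compile(r"\d{10}")  # ten consecutive digits
ten_ds = re.compile(r"d{10}")       # ten consecutive letter "d" characters

sample = "Call 5551234567 today"

print(ten_digits.search(sample).group())        # 5551234567
print(ten_ds.search(sample))                    # None
print(ten_ds.search("dddddddddd") is not None)  # True

# Without boundaries, the pattern also matches inside a longer run of digits.
print(ten_digits.search("123456789012").group())  # 1234567890
print(re.search(r"\b\d{10}\b", "123456789012"))   # None
```

If a 10-digit pattern should not fire inside longer numbers, wrapping it in word boundaries as in the last line is one option to consider.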

Example 1

In this example, we created a rule to identify a specific transaction involving a company name such as "Google" or "Facebook" and a transaction value of "9999.9999". In this rule, the numerical value is the primary keyword, and the company name is a supplementary keyword.


<Rules>
  <!-- Entity is required and must contain at least one Pattern child -->
  <!-- patternsProximity: The number of characters that the Entity's child/inner Pattern elements will consider surrounding content as corroborative evidence. Must be 1 or more. -->
  <!-- recommendedConfidence: The confidence threshold (1 - 100) to consider the Entity as matching -->
  <Entity patternsProximity="100" recommendedConfidence="85">
    <!-- confidenceLevel: The confidence (1 - 100) that this Pattern contributes to the parent Entity's overall confidence level if it matches. -->
    <Pattern confidenceLevel="85">
      <!-- idRef is required and should reference Keyword id or Regex id -->
      <IdMatch idRef="num_string" />
      <!-- The match won't be included in the match results; instead, it serves as supplementary evidence -->
      <Match idRef="words" minOccurs="1" />
    </Pattern>
  </Entity>
  <!-- The 'id' should be unique per rule; the IDs of both Keywords and Regex should not repeat. -->
  <Keyword id="words">
    <Group matchStyle="word">
      <!-- Match either of these strings -->
      <Term>google</Term>
      <Term>facebook</Term>
    </Group>
  </Keyword>
  <!-- Perl regular expression syntax -->
  <Regex id="phone_number">\d{10}</Regex>
  <Keyword id="num_string">
    <Group matchStyle="numerical_string">
      <Term>9999.9999</Term>
    </Group>
  </Keyword>
</Rules>
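
The comments in this rule state two constraints that are easy to break when editing by hand: every idRef should point to a Keyword or Regex id, and ids should not repeat. The Python sketch below checks both, along with XML well-formedness and the requirement that each Entity contains at least one Pattern, using only the standard library. The function name `check_rule` and the wording of its messages are invented for this example, and the sketch says nothing about how the rule editor itself validates a rule.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# The rule from Example 1, without comments.
RULE_XML = r"""<Rules>
  <Entity patternsProximity="100" recommendedConfidence="85">
    <Pattern confidenceLevel="85">
      <IdMatch idRef="num_string" />
      <Match idRef="words" minOccurs="1" />
    </Pattern>
  </Entity>
  <Keyword id="words">
    <Group matchStyle="word">
      <Term>google</Term>
      <Term>facebook</Term>
    </Group>
  </Keyword>
  <Regex id="phone_number">\d{10}</Regex>
  <Keyword id="num_string">
    <Group matchStyle="numerical_string">
      <Term>9999.9999</Term>
    </Group>
  </Keyword>
</Rules>"""


def check_rule(xml_text):
    """Return a list of problems found in a rule; an empty list means none were found."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]

    problems = []

    # Collect the ids declared by Keyword and Regex elements and flag repeats.
    ids = [el.get("id") for el in root.iter() if el.tag in ("Keyword", "Regex")]
    for value, count in Counter(ids).items():
        if count > 1:
            problems.append(f"duplicate id {value!r}")

    # Each Entity needs at least one Pattern child.
    for entity in root.iter("Entity"):
        if entity.find("Pattern") is None:
            problems.append("Entity has no Pattern child")

    # Each IdMatch and Match must reference a declared id.
    for el in root.iter():
        if el.tag in ("IdMatch", "Match"):
            target = el.get("idRef")
            if target is None:
                problems.append(f"{el.tag} has no idRef attribute")
            elif target not in ids:
                problems.append(f"{el.tag} references unknown id {target!r}")

    return problems


print(check_rule(RULE_XML))  # []
```

An unclosed tag, such as an `<IdMatch>` written without the trailing `/>`, is reported by the first check as a well-formedness error before any of the other checks run.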

Example 2

In this example, we created a rule that uses a single regular expression to match specific user names and email addresses.

<Rules>
  <!-- Entity for detecting specific keywords or email addresses -->
  <Entity id="Keyword_Detection" patternsProximity="100" recommendedConfidence="85">
    <!-- Pattern to detect specific keywords or email addresses -->
    <Pattern confidenceLevel="90">
      <IdMatch idRef="regex_keywords" />
    </Pattern>
  </Entity>
  <!-- Regex for matching specific keywords or email addresses -->
  <Regex id="regex_keywords">\b(?:JDoe|john\.doe@acme\.com|JSmith|jane@acme\.com|MBrown|michael\.brown@acme\.com)\b</Regex>
</Rules>
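
To see what this regular expression does and does not match, you can try it on sample text. The sketch below again uses Python's `re` module as a stand-in for the rule editor's engine, which is an assumption. The constructs used here (`\b`, `(?:...)`, `|`, and the escaped dot) behave the same way in Perl-style syntax. The pattern is written with no spaces inside it, and the function name `find_identifiers` and the sample sentences are made up for illustration.

```python
import re

# The alternation from Example 2, written with no spaces inside it.
IDENTIFIERS = re.compile(
    r"\b(?:JDoe|john\.doe@acme\.com|JSmith|jane@acme\.com"
    r"|MBrown|michael\.brown@acme\.com)\b"
)


def find_identifiers(text):
    """Return every listed user name or email address found in `text`."""
    return IDENTIFIERS.findall(text)


print(find_identifiers("Opened by JDoe; cc jane@acme.com and michael.brown@acme.com."))
# ['JDoe', 'jane@acme.com', 'michael.brown@acme.com']

# \b keeps "JDoe" from matching inside a longer word, and the escaped dot
# keeps "john\.doe" from matching "johnXdoe".
print(find_identifiers("JDoes wrote to johnXdoe@acme.com"))
# []
```

Python's `re` is case-sensitive by default, so "jdoe" is not matched in this sketch. Whether the rule editor treats case the same way is not stated here, so verify that in the product before relying on it.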